INDENG 290

Assignment 3

Author: Gopal Kumar

In [79]:
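import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # used by every plotting cell below
import yfinance as yf            # used to download the daily price data
pd.options.mode.chained_assignment = None  # default='warn'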

Problem statement. The goal of this homework is to familiarize students with either time series prediction methods or generative adversarial networks through mini research projects. To get full credit, you need to submit a solution to either Problem A (time series prediction methods) or Problem B (generative adversarial networks) described below, not both! Both problems assume some independent reading and literature research; the lecture notes and the papers provided in the syllabus will be helpful. Problem B assumes independent study of the PyTorch or TensorFlow packages for training neural networks (in Problem A this is required only for extra credit). Below are instructions for installing PyTorch locally and some PyTorch tutorials, as well as instructions for installing TensorFlow locally and some TensorFlow tutorials.

Problem A. Financial time series prediction (100 points)

  • Get daily stock data for AMZN, GOOG, AAL, NCLH from Yahoo Finance for 2016, 2017, 2018, 2019, 2020. You can find examples of using the yfinance library here and here.
In [14]:
tickers_list = ['AMZN', 'GOOG', 'AAL', 'NCLH']
tickers_df = {}
for ticker in tickers_list:
    # Note: `end` is exclusive in yfinance, so the last row returned is 2020-12-30.
    tickers_df[ticker] = yf.download(tickers=[ticker], start="2016-01-01",end="2020-12-31", interval="1d")
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
In [33]:
tickers_df['AMZN'].tail()
Out[33]:
Open High Low Close Adj Close Volume
Date
2020-12-23 160.250000 160.506500 159.208496 159.263504 159.263504 41876000
2020-12-24 159.695007 160.100006 158.449997 158.634506 158.634506 29038000
2020-12-28 159.699997 165.199997 158.634506 164.197998 164.197998 113736000
2020-12-29 165.496994 167.532501 164.061005 166.100006 166.100006 97458000
2020-12-30 167.050003 167.104996 164.123505 164.292496 164.292496 64186000

Predict daily stock volumes.

1. (10 pts) Plot time series for stock volumes and close prices for the above time periods. List observations of the data patterns - what kind of properties should a model have in order to be able to predict stock volumes and close prices well? Comment on the distributional shift observations in 2020 - how would you enhance your models for 2020 to improve performance?

In [31]:
fig, ax = plt.subplots(figsize=(13, 5))
for ticker in tickers_list:
    tickers_df[ticker]['Volume'].plot(label=ticker)
plt.legend(loc='best')
plt.title("Stock Volumes over time")
plt.xlabel("Date")
plt.ylabel("Stock Volumes")
plt.show()
In [58]:
for ticker in tickers_list:
    tx = tickers_df[ticker][['Volume']]
    idx = pd.date_range('2016-01-01', '2020-12-31', freq='D')
    tx = tx.reindex(idx, method='ffill').bfill()  # fill non-trading days; bfill covers the leading gap
    tx['dayofyear'] = tx.index.dayofyear
    tx['year'] = tx.index.year
    tx = tx.reset_index()
    tx = tx.pivot_table(index='dayofyear',columns='year',values='Volume')
    tx.plot(figsize=(13, 5))
    plt.legend(loc='best')
    plt.title(ticker+": Stock Volumes by year")
    plt.xlabel("Day of year")
    plt.ylabel("Stock Volumes")
In [32]:
fig, ax = plt.subplots(figsize=(13, 5))
for ticker in tickers_list:
    tickers_df[ticker]['Adj Close'].plot(label=ticker)
plt.legend(loc='best')
plt.title("Closing price over time")
plt.xlabel("Date")
plt.ylabel("Closing price")
plt.show()
In [59]:
for ticker in tickers_list:
    tx = tickers_df[ticker][['Adj Close']]
    idx = pd.date_range('2016-01-01', '2020-12-31', freq='D')
    tx = tx.reindex(idx, method='ffill').bfill()  # fill non-trading days; bfill covers the leading gap
    tx['dayofyear'] = tx.index.dayofyear
    tx['year'] = tx.index.year
    tx = tx.reset_index()
    tx = tx.pivot_table(index='dayofyear',columns='year',values='Adj Close')
    tx.plot(figsize=(13, 5))
    plt.legend(loc='best')
    plt.title(ticker+": Closing price by year")
    plt.xlabel("Day of year")
    plt.ylabel("Closing price")

2. (30 pts) Using an $N$-day sliding window, use the $N$-day average and $N$-day median methods to predict daily stock volumes for the $(N+1)$-st day in 2019 and 2020 for $N=10,30,60$, namely:

$ \begin{aligned} y_{N+1} &= \frac{y_1+y_2+\ldots+y_N}{N} \\ y_{N+1} &= \operatorname{median}\left(y_1, y_2, \ldots, y_N\right) \end{aligned} $

Analyze the prediction error against the realized volumes on the same days: compute the mean squared error by month. Also calculate the mean squared error for banking holidays vs. ordinary business days. Do you observe any pattern in which $N$ works best? Can you explain why? Do you see any difference across stocks? Elaborate on your findings. You will likely notice that the mean squared error is smaller for ordinary business days than for banking holidays. You will also likely notice an increase in the mean squared error during the distributional shift caused by the Covid shock in 2020.
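The two forecasting rules above can be illustrated on a toy series. The sketch below is only an illustration of the formulas, not the notebook's solution: the function name `sliding_forecasts` and the numbers in `y` are made up for the example, and it assumes NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np

def sliding_forecasts(y, n):
    """Return (mean_forecast, median_forecast), both aligned with y[n:]."""
    y = np.asarray(y, dtype=float)
    # Row i holds the window y[i], ..., y[i+n-1], which is used to forecast y[i+n].
    # The last window has no following observation, so it is dropped.
    windows = np.lib.stride_tricks.sliding_window_view(y, n)[:-1]
    return windows.mean(axis=1), np.median(windows, axis=1)

# Toy "volumes" with a single spike (hypothetical numbers, not market data).
y = [10, 12, 11, 50, 12, 11, 13]
mean_fc, median_fc = sliding_forecasts(y, n=3)
actual = np.asarray(y[3:], dtype=float)

mse_mean = np.mean((mean_fc - actual) ** 2)
mse_median = np.mean((median_fc - actual) ** 2)
print("MSE (mean):  ", mse_mean)
print("MSE (median):", mse_median)
```

On this toy series the median forecast recovers faster after the spike, because a single outlier inside the window shifts the mean but not the median. Whether the same holds for real volumes depends on how persistent the volume shocks are.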

In [175]:
squared_error = {}
squared_error[2019] = []
squared_error[2020] = []
for ticker in tickers_list:
    for window_size in [10,30,60]:
        for year in [2019, 2020]:
            df_ = tickers_df[ticker][['Volume']].copy()
            df_.columns = ['Actual Volume']
            # Forecast for day t is the mean of the N preceding days. Rolling over the
            # full history (before filtering by year) lets early-January forecasts use
            # the previous year's data and keeps 'Actual Volume' as the realized value.
            df_['Predicted Volume'] = df_['Actual Volume'].rolling(window = window_size).mean().shift(1)
            df_ = df_[df_.index.year == year]
            df_.plot(figsize=(6, 3))
            plt.legend(loc='best')
            plt.title(ticker+": Volume forecast using "+str(window_size)+"-day rolling mean", fontsize=10)
            plt.xlabel("Date")
            plt.ylabel("Volume")
            df_['Error'] = df_['Predicted Volume'] - df_['Actual Volume']
            df_[ticker+"_"+str(window_size)+"_"+str(year)]= np.square(df_['Error'])
            squared_error[year].append(df_[ticker+"_"+str(window_size)+"_"+str(year)])
C:\Users\dell\anaconda3\lib\site-packages\pandas\plotting\_matplotlib\core.py:386: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
  fig = self.plt.figure(figsize=self.figsize)
In [176]:
# holidays on which NASDAQ is open
banking_holidays = [
    '2019-10-14', # Columbus Day
    '2019-11-11', # Veterans Day
    '2019-11-29', # Friday after Thanksgiving
    '2019-12-31', # Dec 31
    '2020-10-12', # Columbus Day
    '2020-11-11', # Veterans Day
    '2020-11-27', # Friday after Thanksgiving
    '2020-12-31', # Dec 31
]
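As an alternative to typing the dates by hand, a list like the one above could be derived from a holiday calendar. The sketch below is an assumption-laden illustration: it uses pandas' `USFederalHolidayCalendar` and keeps only the federal holidays that also appear in a set of trading days. The helper name `open_market_bank_holidays` is hypothetical, and the short `trading_days` index stands in for `tickers_df['AMZN'].index`. Note that the Friday after Thanksgiving and December 31 are not federal holidays, so they would still need to be added manually.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

def open_market_bank_holidays(trading_days, start, end):
    """Federal holidays in [start, end] that are also trading days."""
    federal = USFederalHolidayCalendar().holidays(start=start, end=end)
    return federal.intersection(pd.DatetimeIndex(trading_days))

# Stand-in for the index of a downloaded price frame (assumed trading days).
trading_days = pd.to_datetime(
    ['2019-10-11', '2019-10-14', '2019-11-11', '2019-11-12']
)
result = open_market_bank_holidays(trading_days, '2019-01-01', '2019-12-31')
print(result)
```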
In [177]:
for year in [2019,2020]:
    tx = pd.concat(squared_error[year],axis=1)
    for ticker in tickers_list:
        colnames= [col for col in tx.columns if ticker in col]
        tx_ = tx[colnames].copy()
        tx_.columns = ['10-day window','30-day window','60-day window']
        ty = tx_.copy()
        tx_ = tx_.groupby(pd.Grouper(freq='M')).mean()
        tx_ = tx_.rename(index=lambda x: x.strftime('%B'))
        tx_.plot(kind='bar',title=ticker+": Mean-squared error by month in "+str(year), figsize=(6,3))
        
        holidays_list_ = pd.to_datetime(banking_holidays)
        ty_holidays = ty.loc[ty.index.isin(holidays_list_)]
        ty_working = ty[~ty.index.isin(holidays_list_)]
        holidays = ty_holidays.mean(axis=0).to_frame()
        holidays.columns = ['Holidays']
        working = ty_working.mean(axis=0).to_frame()
        working.columns = ['Business Days']
        pd.concat([holidays,working],axis=1).T.plot(kind='barh',title=ticker+": Mean-squared error by business-days & holidays in "+str(year), figsize=(6,3))
In [178]:
squared_error = {}
squared_error[2019] = []
squared_error[2020] = []
for ticker in tickers_list:
    for window_size in [10,30,60]:
        for year in [2019, 2020]:
            df_ = tickers_df[ticker][['Volume']].copy()
            df_.columns = ['Actual Volume']
            # Forecast for day t is the median of the N preceding days. Rolling over the
            # full history (before filtering by year) lets early-January forecasts use
            # the previous year's data and keeps 'Actual Volume' as the realized value.
            df_['Predicted Volume'] = df_['Actual Volume'].rolling(window = window_size).median().shift(1)
            df_ = df_[df_.index.year == year]
            df_.plot(figsize=(6, 3))
            plt.legend(loc='best')
            plt.title(ticker+": Volume forecast using "+str(window_size)+"-day rolling median", fontsize=10)
            plt.xlabel("Date")
            plt.ylabel("Volume")
            df_['Error'] = df_['Predicted Volume'] - df_['Actual Volume']
            df_[ticker+"_"+str(window_size)+"_"+str(year)]= np.square(df_['Error'])
            squared_error[year].append(df_[ticker+"_"+str(window_size)+"_"+str(year)])
C:\Users\dell\anaconda3\lib\site-packages\pandas\plotting\_matplotlib\core.py:386: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
  fig = self.plt.figure(figsize=self.figsize)
In [179]:
for year in [2019,2020]:
    tx = pd.concat(squared_error[year],axis=1)
    for ticker in tickers_list:
        colnames= [col for col in tx.columns if ticker in col]
        tx_ = tx[colnames].copy()
        tx_.columns = ['10-day window','30-day window','60-day window']
        ty = tx_.copy()
        tx_ = tx_.groupby(pd.Grouper(freq='M')).mean()
        tx_ = tx_.rename(index=lambda x: x.strftime('%B'))
        tx_.plot(kind='bar',title=ticker+": Mean-squared error by month in "+str(year), figsize=(6,3))
        
        holidays_list_ = pd.to_datetime(banking_holidays)
        ty_holidays = ty.loc[ty.index.isin(holidays_list_)]
        ty_working = ty[~ty.index.isin(holidays_list_)]
        holidays = ty_holidays.mean(axis=0).to_frame()
        holidays.columns = ['Holidays']
        working = ty_working.mean(axis=0).to_frame()
        working.columns = ['Business Days']
        pd.concat([holidays,working],axis=1).T.plot(kind='barh',title=ticker+": Mean-squared error by business-days & holidays in "+str(year), figsize=(6,3))

3. (30 pts) Daily volumes are often forecast using linear autoregressive models. Using an $N$-day sliding window, find the coefficients $A, B, C$ in the lag-1 and lag-2 linear autoregressive models below to predict daily stock volumes for the $(N+1)$-st day in 2019 and 2020 for $N=10,30,60$. Specifically:

$ \begin{aligned} y_{N+1} &= A y_N+B+\epsilon_{N+1} \\ y_{N+1} &= A y_N+B y_{N-1}+C+\epsilon_{N+1} \end{aligned} $

Do you think higher-lag models would be necessary? Why? Do you observe any pattern in which $N$ works best? Do you see any difference across stocks? Repeat the mean squared error analysis above and comment on your findings with regard to ordinary business days vs. holidays, as well as the distributional shift in 2020.
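The two models above can be fitted by ordinary least squares inside each window. The following is a minimal sketch under stated assumptions, not the required solution: it uses plain NumPy (`np.linalg.lstsq`) rather than a time series library, the helper names `ar_one_step` and `rolling_ar_forecast` are hypothetical, and the series `y` is synthetic. A window of $N$ days with $p$ lags yields only $N-p$ equations for $p+1$ unknowns, so very short windows leave few or no degrees of freedom.

```python
import numpy as np

def ar_one_step(window, lags):
    """Fit y_t = a_1*y_{t-1} + ... + a_p*y_{t-p} + c by OLS on `window`
    and return (coefficients, one-step-ahead forecast)."""
    w = np.asarray(window, dtype=float)
    # One column per lag (lag 1 first), followed by an intercept column.
    cols = [w[lags - k:len(w) - k] for k in range(1, lags + 1)]
    X = np.column_stack(cols + [np.ones(len(w) - lags)])
    coef, *_ = np.linalg.lstsq(X, w[lags:], rcond=None)
    last = np.append(w[::-1][:lags], 1.0)  # [y_N, y_{N-1}, ..., 1]
    return coef, float(last @ coef)

def rolling_ar_forecast(y, n, lags):
    """Forecast y[t] from the n preceding values, for t = n, ..., len(y)-1."""
    y = np.asarray(y, dtype=float)
    return np.array([ar_one_step(y[t - n:t], lags)[1] for t in range(n, len(y))])

# Synthetic AR(1)-like series with noise (hypothetical data, fixed seed).
rng = np.random.default_rng(0)
y = [100.0]
for _ in range(39):
    y.append(0.6 * y[-1] + 40.0 + rng.normal(scale=5.0))

forecasts = rolling_ar_forecast(y, n=10, lags=2)
print(forecasts[:5])
```

The squared errors `(forecasts - y[10:]) ** 2` can then be grouped by month or by holiday flag in the same way as for the rolling mean and median.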

In [198]:
ticker = 'AMZN'
year = 2019
In [217]:
df_ = tickers_df[ticker][['Volume']]
df_.columns = ['Actual Volume']
df_.loc[:,'year'] = df_.index.year
df_ = df_[df_.year == year]
df_ = df_[['Actual Volume']]
In [218]:
df_
Out[218]:
Actual Volume
Date
2019-01-02 159662000
2019-01-03 139512000
2019-01-04 183652000
2019-01-07 159864000
2019-01-08 177628000
... ...
2019-12-24 17626000
2019-12-26 120108000
2019-12-27 123732000
2019-12-30 73494000
2019-12-31 50130000

252 rows × 1 columns

In [194]:
from statsmodels.tsa.ar_model import AutoReg
# from sklearn.metrics import mean_squared_error
In [219]:
window_size = 10
t = df_[window_size:]
In [221]:
index = t.index[0]
In [222]:
tx = df_[:index]
In [223]:
ty = tx[:window_size]
X = ty['Actual Volume'].values
model = AutoReg(X, lags=2)
model_fit = model.fit()
print('Coefficients: %s' % model_fit.params)
Coefficients: [3.00013043e+07 2.79992030e-01 4.74079593e-01]
In [229]:
model_fit.predict(start=len(X),end=len(X), dynamic=False)
Out[229]:
array([1.20537441e+08])
In [207]:
tx
Out[207]:
Actual Volume
Date
2019-01-02 159662000
2019-01-03 139512000
2019-01-04 183652000
2019-01-07 159864000
2019-01-08 177628000
2019-01-09 126976000
2019-01-10 130154000
2019-01-11 93724000
2019-01-14 120118000
2019-01-15 119970000
2019-01-16 127338000
In [197]:
# Slide a full window of `window_size` days across the year. Stopping at the
# last full window avoids the short end-of-year slices: with too few points,
# AutoReg has as many parameters as usable observations and statsmodels
# raises ZeroDivisionError when computing the residual scale.
volumes = df_['Actual Volume'].values
for start in range(len(volumes) - window_size + 1):
    X = volumes[start:start + window_size]
    model_fit = AutoReg(X, lags=2).fit()
    print('Coefficients: %s' % model_fit.params)
Coefficients: [3.00013043e+07 2.79992030e-01 4.74079593e-01]
Coefficients: [4.12654779e+07 3.24779391e-01 3.22071885e-01]
Coefficients: [3.94981072e+07 4.71758305e-02 5.52380141e-01]
Coefficients: [ 9.00110013e+07 -1.01521624e-01  2.86311734e-01]
Coefficients: [ 1.47694199e+08 -3.18947925e-01  3.77734604e-02]
...
Coefficients: [ 1.22105276e+08  1.75629198e-02 -7.16774430e-01]
Coefficients: [ 1.21261559e+08  1.22636376e-02 -6.55751620e-01]

4. (30 pts) Propose a method to improve volume prediction for banking holidays - you might need to use data for 2016, 2017, and 2018 (and perhaps even earlier) for that. Repeat the mean squared error analysis and justify why the method you are proposing is superior to the ones above.
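One possible approach, offered only as a sketch and not as the intended answer, is to learn from earlier years how holiday volume compares with the rolling-mean baseline and to scale the baseline by that ratio on holidays. The function name `holiday_adjusted_forecast`, the constant synthetic volumes, and the choice of a single multiplicative factor are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def holiday_adjusted_forecast(volume, holidays, n, train_end):
    """Rolling-mean forecast, scaled on holidays by a ratio learned up to `train_end`."""
    baseline = volume.rolling(window=n).mean().shift(1)
    is_holiday = volume.index.isin(pd.to_datetime(holidays))
    in_training = volume.index <= pd.Timestamp(train_end)
    # Average ratio of realized holiday volume to the baseline forecast,
    # estimated only on the training period to avoid look-ahead.
    factor = (volume / baseline)[is_holiday & in_training].dropna().mean()
    adjusted = baseline.copy()
    adjusted[is_holiday] = baseline[is_holiday] * factor
    return adjusted, factor

# Synthetic data: flat volume of 100 with a drop to 60 on each Columbus Day.
idx = pd.bdate_range('2016-01-01', '2019-12-31')
holidays = ['2016-10-10', '2017-10-09', '2018-10-08', '2019-10-14']
volume = pd.Series(100.0, index=idx)
volume.loc[pd.to_datetime(holidays)] = 60.0

adjusted, factor = holiday_adjusted_forecast(
    volume, holidays, n=10, train_end='2018-12-31'
)
print(factor, adjusted.loc['2019-10-14'])
```

With real data the ratio would likely differ by holiday and by stock, so a per-holiday factor (or a regression with holiday dummies) may be a better fit. The mean squared error comparison against the unadjusted baseline is what would justify the choice.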


5. (10 bonus points) Use neural networks to improve the daily volume forecast above. Training neural networks can be expensive; therefore, for the purposes of this exercise, we need not consider the entire two-year period - pick a month or two and focus on improving the forecast over the classic time series models for that period. When presenting your results, elaborate on the neural network architecture used, the training data (e.g., the choice of training set size), training details (hyperparameters used, etc.), training loss, and so on. Visualizations will be helpful. Were you able to "beat" the benchmark from the prior exercises in terms of prediction error?
